Discovering frequent pattern pairs

Authors

  • Carlos Ordonez
  • Zhibo Chen
Abstract

Cubes and association rules discover frequent patterns in a data set, most of which are not significant. Thus previous research has introduced search constraints and statistical metrics to discover significant patterns and reduce processing time. We introduce cube pairs (comparing cube groups based on a parametric statistical test) and rule pairs (based on two similar association rules), which are pattern pair generalizations of cubes and association rules, respectively. We introduce algorithmic optimizations to discover comparable pattern sets. We carefully study why both techniques agree or disagree on the validity of specific pairs, considering p-value for statistical tests, as well as confidence for association rules. In addition, we analyze the probabilistic distribution of target attributes given confidence thresholds. We also introduce a reliability metric based on cross-validation, which enables an objective comparison between both patterns. We present an extensive experimental evaluation with real data sets to understand significance and reliability of pattern pairs. We show cube pairs generally produce more reliable results than rule pairs.
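The two quantities the abstract contrasts, a p-value for comparing two cube groups and confidence for two similar association rules, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the toy records, the `chol=high` cutoff of 225, the choice of Welch's two-sample statistic as the parametric test, the normal approximation for the p-value, and the reading of a "rule pair" as two rules with the same consequent whose antecedents differ in one item.

```python
import math
from statistics import mean, variance

def welch_statistic(a, b):
    """Welch's two-sample statistic for the difference in means of groups a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal approximation.
    Assumption: only reasonable for large groups; a t distribution is more exact."""
    return math.erfc(abs(z) / math.sqrt(2))

def confidence(transactions, antecedent, consequent):
    """conf(X => Y) = count(X and Y) / count(X); 0.0 if X never occurs."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x if n_x else 0.0

# Hypothetical toy data: (smoker flag, cholesterol measurement).
records = [("yes", 250), ("yes", 240), ("yes", 260), ("yes", 220),
           ("no", 190), ("no", 200), ("no", 210), ("no", 230)]

# Cube pair (illustrative): two groups that differ in one dimension value.
group_yes = [m for s, m in records if s == "yes"]
group_no = [m for s, m in records if s == "no"]
stat = welch_statistic(group_yes, group_no)
p_value = two_sided_p_normal(stat)

# Rule pair (illustrative): same consequent, antecedents differ in one item.
HIGH = 225  # hypothetical cutoff used to binarize the measure
transactions = [
    {f"smoker={s}", "chol=high" if m >= HIGH else "chol=low"}
    for s, m in records
]
conf_yes = confidence(transactions, {"smoker=yes"}, {"chol=high"})
conf_no = confidence(transactions, {"smoker=no"}, {"chol=high"})

print(f"statistic={stat:.3f} p~{p_value:.4f} "
      f"conf_yes={conf_yes:.2f} conf_no={conf_no:.2f}")
```

The cube-style comparison works on the raw measure, while the rule-style comparison only sees the binarized item, which is one plausible reason the two techniques can disagree on the same pair.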

Similar resources

Frequent Pattern Mining in Web Log Data

Frequent pattern mining is a heavily researched area in the field of data mining with a wide range of applications. One of them is the use of frequent pattern discovery methods on Web log data. Discovering hidden information from Web log data is called Web usage mining. The aim of discovering frequent patterns in Web log data is to obtain information about the navigational behavior of the users. This...

Automata Theory Approach for Solving Frequent Pattern Discovery Problems

The various types of frequent pattern discovery problems, namely the frequent itemset, sequence, and graph mining problems, are solved in different ways that are nonetheless similar in certain respects. The main approaches to discovering such patterns fall into two classes: levelwise methods and database projection-based methods. The level-...

Mining of Users’ Access Behaviour for Frequent Sequential Pattern from Web Logs

Sequential pattern mining is the process of applying data mining techniques to a sequential database in order to discover the correlations that exist among an ordered list of events. The task of discovering frequent sequences is challenging because the algorithm needs to process a combinatorially explosive number of possible sequences. Discovering hidden information fro...

Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi-domain proteins, and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However, they do not consider the in...

Discovering partial periodic-frequent patterns in a transactional database

Time and frequency are two important dimensions to determine the interestingness of a pattern in a database. Periodic-frequent patterns are an important class of regularities that exist in a database with respect to these two dimensions. Current studies on periodic-frequent pattern mining have focused on discovering full periodic-frequent patterns, i.e., finding all frequent patterns that have ...

Journal:
  • Intell. Data Anal.

Volume 17

Publication date: 2013